Robots.txt is a text file webmasters create to instruct robots (typically search engine crawlers) how to crawl pages on their website.
The robots.txt file must be placed in the top-level directory of a web server in order to work. Example: http://www.example.com/robots.txt
A robots.txt file tells search engines which pages to crawl on your website and which pages to skip. For example, if you specify in your robots.txt file that you don’t want search engines to access your thank-you page, that page generally won’t show up in search results, so web users won’t find it through a search engine. Keeping search engines away from certain pages on your site matters both for the privacy of your site and for your SEO. This article explains why, and shows how to set up a good robots.txt file.
A robots.txt file is useful if:
• you want search engines to ignore duplicate pages on your website
• you don’t want search engines to index your internal search results pages
• you don’t want search engines to index certain areas of your website, or the whole website
• you don’t want search engines to index certain files on your website (images, PDFs, etc.)
• you want to tell search engines where your sitemap is located
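As a rough sketch, a robots.txt file covering the cases above might look like this (the paths and the sitemap URL are placeholders, not recommendations for your site):

```
User-agent: *
Disallow: /duplicate-page.html
Disallow: /search/
Disallow: /private/
Disallow: /downloads/brochure.pdf

Sitemap: http://www.example.com/sitemap.xml
```

`User-agent: *` applies the record to all crawlers, and the `Sitemap` line simply points crawlers at your sitemap file.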
As mentioned above, the robots.txt file is a simple text file; open a plain text
editor to create it. The content of a robots.txt file consists of so-called "records".
A record contains the instructions for a specific search engine. Each record consists
of two fields: the User-agent line and one or more Disallow lines. Here's an example:
User-agent: googlebot
Disallow: /cgi-bin/
This robots.txt file would allow "googlebot", the search engine spider of Google, to retrieve every page from your site except for files from the "cgi-bin" directory; all files in the "cgi-bin" directory will be ignored by googlebot. The Disallow rule works as a prefix match. If you enter:
User-agent: googlebot
Disallow: /support
then both "/support-desk/index.html" and "/support/index.html", as well as all other files whose paths start with "/support", would not be crawled by googlebot.
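You can check this prefix-matching behaviour yourself with Python's standard-library robots.txt parser. This is just a quick sketch that feeds the article's example record into `urllib.robotparser` and asks whether googlebot may fetch a few hypothetical URLs:

```python
from urllib.robotparser import RobotFileParser

# The example record from the article.
rules = [
    "User-agent: googlebot",
    "Disallow: /support",
]

rp = RobotFileParser()
rp.parse(rules)

# "Disallow: /support" is a prefix match, so any path starting
# with "/support" is blocked for googlebot.
print(rp.can_fetch("googlebot", "http://www.example.com/support/index.html"))       # False
print(rp.can_fetch("googlebot", "http://www.example.com/support-desk/index.html"))  # False
print(rp.can_fetch("googlebot", "http://www.example.com/about.html"))               # True
```

This makes it easy to test a draft robots.txt before publishing it, instead of waiting to see what a crawler actually does.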
It’s important to update your robots.txt file whenever you add pages, files or directories to your site that you don’t wish to be indexed by search engines. This helps keep those parts of your website out of search results and gets the best possible results from your search engine optimization.